Similarity-aware Query Processing and Optimization
Authors
Abstract
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological applications, require or can significantly benefit from the identification and processing of similarities in the data. Although some work has extended the semantics of certain operators, e.g., join and selection, to be aware of data similarities, there has been little study of the role, interaction, and implementation of similarity-aware operations as first-class database operators. This thesis proposes and studies several similarity-aware database operators and systematically analyzes their role as query operators, their interactions, their optimization, and techniques for implementing them. This work presents a detailed study of two core similarity-aware operators: Similarity Group-by and Similarity Join. We describe multiple optimization techniques for these operators. Specifically, we present (1) multiple non-trivial equivalence rules that enable similarity query transformations, (2) Eager and Lazy aggregation transformations for Similarity Group-by and Similarity Join that allow pre-aggregation before potentially expensive joins, and (3) techniques for using materialized views to answer similarity-based queries. We also present the main guidelines for implementing these operators as integral components of a database query engine, along with key performance evaluation results for an implementation in an open-source database system. We introduce a comprehensive conceptual evaluation model for similarity queries with multiple similarity-aware predicates, i.e., Similarity Selection, Similarity Join, and Similarity Group-by. This model clearly defines the expected correct result of a query with multiple similarity-aware predicates. Furthermore, we present multiple transformation rules that convert the initial evaluation plan into more efficient equivalent plans.
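The abstract does not spell out the operators' exact semantics, so the following is only a minimal Python sketch under assumed definitions: a Similarity Join is taken to be an ε-join on one numeric attribute (|x - y| <= eps), and a Similarity Group-by is taken to start a new group whenever the gap between consecutive sorted values exceeds eps. The names `similarity_join`, `similarity_group_by`, `lazy_sum`, `eager_sum`, and the parameter `eps` are hypothetical and are not taken from the thesis. The sketch also illustrates the idea behind Eager aggregation: when the join predicate reads only S's join value, summing S by that value before the join yields the same per-key totals as joining first.

```python
from collections import defaultdict

# Hypothetical illustration only: relation R has rows (r_id, x) and relation S
# has rows (y, amount); x and y are the numeric attributes compared by similarity.


def similarity_join(r, s, eps):
    """Assumed ε-join semantics: pair every R row with every S row whose join
    values differ by at most eps (naive nested loop, for clarity)."""
    return [(r_id, x, y, amount)
            for r_id, x in r
            for y, amount in s
            if abs(x - y) <= eps]


def similarity_group_by(values, eps):
    """Assumed unsupervised grouping: sort the values and start a new group
    whenever the gap to the previous value exceeds eps."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= eps:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def lazy_sum(r, s, eps):
    """Lazy plan: run the similarity join first, then SUM(amount) per r_id."""
    totals = defaultdict(int)
    for r_id, _x, _y, amount in similarity_join(r, s, eps):
        totals[r_id] += amount
    return dict(totals)


def eager_sum(r, s, eps):
    """Eager plan: pre-aggregate S by its join value y, then join the smaller
    pre-aggregated relation. Valid here because the join predicate reads only y."""
    partial = defaultdict(int)
    for y, amount in s:
        partial[y] += amount
    return lazy_sum(r, list(partial.items()), eps)


if __name__ == "__main__":
    r = [(1, 10), (2, 20)]
    s = [(9, 5), (11, 7), (11, 1), (30, 4)]
    print(lazy_sum(r, s, eps=2))    # {1: 13}
    print(eager_sum(r, s, eps=2))   # {1: 13}
    print(similarity_group_by([1, 2, 3, 10, 11, 20], eps=2))
    # [[1, 2, 3], [10, 11], [20]]
```

Both plans return the same totals, but the eager plan joins three pre-aggregated S rows instead of four raw ones; with many duplicate join values this reduction is what makes pre-aggregation attractive. The gap-based grouping rule is just one plausible choice: it lets a group's overall spread exceed eps through chaining, and other semantics (for example, bounding a group's diameter) are equally conceivable.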
Similar resources
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims to prototype query-by-humming music information retrieval systems for smartphones. Methods: This multi-method research simultaneously follows the simulation technique from the mixed models of operations research methodology and the documentary research method. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract: Keyword Search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with a few relations, but as the number of relations in a query increases, they require much more memory and processing and become unsuitable, so random and evolutionary methods have to be used instead. The use of evolutionary methods, beca...
RANK-AWARE QUERY PROCESSING AND OPTIMIZATION A Thesis
Ilyas, Ihab F. Ph.D., Purdue University, August, 2004. Rank-aware Query Processing and Optimization. Major Professors: Ahmed K. Elmagarmid and Walid G. Aref. This dissertation focuses on supporting ranking in relational database systems through a rank-aware query processing and optimization framework. We introduce ranking algorithms and operators to be adopted by current relational query engine...
VOGUE: Towards A Visual Interaction-aware Graph Query Processing Framework
Due to the complexity of graph query languages, the need for visual query interfaces that can reduce the burden of query formulation is fundamental to spreading graph data management tools to the wider community. We present a novel HCI (human-computer interaction)-aware graph query processing paradigm, where instead of processing a query graph after its construction, it interleaves visual qu...